1. Introduction.

The Houston Integrated Spatial/Spectral Estimator (HISSE) is a statistical
estimation procedure based on a normal mixture model. It is designed to take
advantage of spatial associations of LANDSAT data pixels produced by an
automated spatial/spectral clustering algorithm. The clustering algorithm used
in this experiment is the AMOEBA algorithm developed at Texas A&M University,
which is based on the three assumptions listed below [1]. AMOEBA detects
spatially connected sets of LANDSAT pixels, called patches, whose elements
are characterized by spectral similarity, within certain tolerances, to their
neighbors.

Assumption 1: Real classes exist.

Assumption 2: Each patch contains pixels from one and only one real class.

Assumption 3: Each real class is represented by at least one patch.

No absolute commitment to the agricultural nature of real classes is 
expressed in [1]; however, there is an indication of a high degree of purity 
of patches with respect to ground truth labels when AMOEBA patches are plotted 
on ground truth maps. A more complete study, with the same conclusion, is 
reported in [5]. Therefore, we feel justified in identifying the real classes 
with ground truth labels. In addition to the three assumptions just given,
HISSE requires the following assumption.

Assumption 4: The data from each patch is normally distributed with mean and
covariance depending only on the class to which it belongs.


Assumption 4 has been challenged, some might say refuted, in [2].
However, we take the position that the proper question to ask is whether
Assumption 4 is close enough to the truth to be useful in estimating class
proportions and labeling classes with ground truth labels. The clustering
portion of AMOEBA may be described as a k-means algorithm which respects patch
integrity (see Assumption 2) with a novel way of determining the correct number
of clusters. As such, it contains no way of compensating for the confusion
arising from classes with overlapping spectral characteristics. Thus,
Assumption 4 may be regarded as a step toward mitigating the error in proportion
estimation which is unavoidable with the classify and count method. Henceforth,
pixels contained in patches will be called pure pixels, and all others boundary
pixels.


2. Mathematical Description.

It is assumed that there are $m$ real classes, labelled $1, \ldots, m$, and $p$
patches represented by independent random vectors $(X_1, \theta_1), \ldots,
(X_p, \theta_p)$, where $\theta_j \in \{1, \ldots, m\}$ is the unknown real class
to which patch $j$ belongs and $X_j = (X_{j1}, \ldots, X_{jN_j})$ is a set of
$n$-vectors representing the spectral data from the $j$th patch. The $\theta_j$
are i.i.d. with $\alpha_\ell = \mathrm{Prob}[\theta_j = \ell]$ unknown and,
given that $\theta_j = \ell$, $X_j$ is a random sample from an $n$-variate
normal distribution $N_n(\mu_\ell, \Omega_\ell)$ with unknown mean and
covariance. Notice that $\alpha_\ell$ is the expected fraction of patches
belonging to class $\ell$ and for a given scene may be quite different from
the fraction of pure pixels belonging to class $\ell$, which we denote by
$\varphi_\ell$. The random variable $\varphi_\ell$ is directly related to
the total acreage of the patches belonging to class $\ell$.

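The log likelihood function for the parameters is

(1)    $L = \sum_{j=1}^{p} \log f(X_j)$

where

(2)    $f(X_j) = \sum_{\ell=1}^{m} \alpha_\ell f_\ell(X_j)$

and $f_\ell(X_j)$ is the $N_j$-fold product normal density of the pixels in
patch $j$ under $N_n(\mu_\ell, \Omega_\ell)$.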
Despite the apparent complexity of $L$, it depends on the data only through
the patch means

(4)    $m_j = \frac{1}{N_j} \sum_{k=1}^{N_j} X_{jk}$

and scatter matrices

(5)    $S_j = \sum_{k=1}^{N_j} (X_{jk} - m_j)(X_{jk} - m_j)^T$.

Once the $m_j$'s and $S_j$'s are computed and stored, HISSE has no further
need for the pure data.
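
The reduction to patch means and scatter matrices can be checked numerically. The sketch below is an illustration only and is not part of HISSE: it assumes NumPy, and the function names are hypothetical. It evaluates the log of the product normal density once from the stored statistics (N_j, m_j, S_j) and once pixel by pixel, using the identity that the summed quadratic forms equal the trace of the inverse covariance times the scatter about the class mean.

```python
import numpy as np

def patch_stats(X):
    """Return (N_j, m_j, S_j) for one patch; X has shape (N_j, n)."""
    N = X.shape[0]
    m = X.mean(axis=0)
    D = X - m
    return N, m, D.T @ D

def log_density_from_stats(N, m, S, mu, omega):
    """Log of the N-fold product normal density using only (N, m, S)."""
    n = mu.shape[0]
    _, logdet = np.linalg.slogdet(omega)
    d = m - mu
    A = S + N * np.outer(d, d)              # scatter about the class mean
    quad = np.trace(np.linalg.solve(omega, A))
    return -0.5 * (N * (n * np.log(2 * np.pi) + logdet) + quad)

def log_density_direct(X, mu, omega):
    """Same quantity computed pixel by pixel, for comparison."""
    n = mu.shape[0]
    _, logdet = np.linalg.slogdet(omega)
    D = X - mu
    quad = np.einsum("ij,ij->", D, np.linalg.solve(omega, D.T).T)
    return -0.5 * (X.shape[0] * (n * np.log(2 * np.pi) + logdet) + quad)
```

The two functions agree to rounding error on any patch, which is why the raw pure pixels need not be kept once the statistics are stored.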



The numerical procedure used in HISSE for finding a maximum of the
likelihood function is defined by iteratively substituting into the likelihood
equations (6)-(8), where $R_j = S_j + N_j m_j m_j^T$ is the noncentral scatter
of the $j$th patch. The values of the parameters used in evaluating the ratios
$f_\ell(X_j)/f(X_j)$ are those at the preceding $k$th step of the algorithm.
It is shown in [6] that there is a unique strongly consistent solution of the
likelihood equations in a neighborhood of the true parameters as
$p \to \infty$ and that the iteration procedure (6)-(8) converges to the
consistent solution if the starting values are near it.
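
Equations (6)-(8) are not legible in this copy. The sketch below therefore assumes they take the usual fixed-point (EM-type) form for a normal mixture in which class membership is attached to whole patches. It is a hedged illustration, not the HISSE code; it assumes NumPy, and all function and variable names are hypothetical. Only the stored statistics N_j, m_j and R_j are used.

```python
import numpy as np

def log_patch_density(N, m, R, mu, omega):
    """Log N_j-fold product normal density from (N_j, m_j, R_j)."""
    n = mu.shape[0]
    _, logdet = np.linalg.slogdet(omega)
    # Scatter about mu, expanded in terms of the noncentral scatter R.
    A = R - N * (np.outer(m, mu) + np.outer(mu, m) - np.outer(mu, mu))
    quad = np.trace(np.linalg.solve(omega, A))
    return -0.5 * (N * (n * np.log(2 * np.pi) + logdet) + quad)

def em_step(Ns, ms, Rs, alpha, mus, omegas):
    """One assumed fixed-point update of (alpha, mu, Omega).

    Ns: (p,) patch sizes; ms: (p, n) patch means; Rs: (p, n, n) noncentral
    scatters; alpha: (m,); mus: (m, n); omegas: (m, n, n).
    """
    p, n_classes = len(Ns), len(alpha)
    logw = np.array([[np.log(alpha[l])
                      + log_patch_density(Ns[j], ms[j], Rs[j], mus[l], omegas[l])
                      for l in range(n_classes)] for j in range(p)])
    logw -= logw.max(axis=1, keepdims=True)   # stabilise before exponentiating
    w = np.exp(logw)
    w /= w.sum(axis=1, keepdims=True)         # posterior class probability per patch
    new_alpha = w.mean(axis=0)                # expected fraction of patches
    new_mus, new_omegas = [], []
    for l in range(n_classes):
        pix = np.sum(w[:, l] * Ns)            # expected pure pixels in class l
        mu = np.einsum("j,j,jk->k", w[:, l], Ns, ms) / pix
        second = np.einsum("j,jab->ab", w[:, l], Rs) / pix
        new_mus.append(mu)
        new_omegas.append(second - np.outer(mu, mu))
    return new_alpha, np.array(new_mus), np.array(new_omegas), w
```

Calling `em_step` repeatedly from the cluster statistics until the parameters stop changing mirrors the procedure described in the text; the returned `w` holds the patch posterior probabilities at the current step.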

Let $N = N_1 + \cdots + N_p$ be the total number of pure pixels. It is easy to
show that $E(\varphi_\ell) = \alpha_\ell$ and

       $\mathrm{var}(\varphi_\ell) \le \frac{1}{4N^2} \sum_{j=1}^{p} N_j^2$.

Thus, if the patches are nearly uniform in size, the MLE of $\alpha_\ell$ can be
used as a predictor of $\varphi_\ell$. However, the least MSE predictor of
$\varphi_\ell$ based on the observed data (assuming that the parameters are
known) is the conditional expectation

(9)    $\delta_\ell = \frac{1}{N} \sum_{j=1}^{p} N_j \frac{\alpha_\ell f_\ell(X_j)}{f(X_j)}$.

Therefore, we take $\delta_\ell$ evaluated with the maximum likelihood estimates
of the parameters as our estimate of $\varphi_\ell$.
In processing the boundary pixels, which typically constitute 60-70% of the
scene, we assume that the boundary data consist of an independent sample from
a mixture

(10)   $\sum_{\ell=1}^{m} \alpha_\ell N_n(\mu_\ell, \Omega_\ell)$

where the component normal distributions are the same class distributions
represented in the pure data, plus observations from a contaminant class
(possibly corresponding to the "not in field" ground truth label) in the tails
of the $N_n(\mu_\ell, \Omega_\ell)$. In other words, we assume that a boundary
observation which is spectrally unlike all of the pure classes is much more
likely to be from the contaminating class than an outlier from one of the pure
classes. Therefore we classify as a contaminant each boundary observation $X$
which satisfies

(11)   $(X - \mu_\ell)^T \Omega_\ell^{-1} (X - \mu_\ell) > \chi^2_\alpha$

for all $\ell = 1, \ldots, m$, where the $\mu_\ell$'s and $\Omega_\ell$'s are
the previously estimated pure data class means and covariances and
$\chi^2_\alpha$ is a size $\alpha$ critical value for $\chi^2$ with $n$ degrees
of freedom. In this experiment we chose $\alpha = .1$.
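
The thresholding rule can be sketched in a few lines. The example below is an illustration under stated assumptions, not the HISSE implementation: the function names are hypothetical, the inverse covariances are assumed to be precomputed, and the critical value is found by bisection on the closed-form chi-square tail probability that holds for an even number of degrees of freedom (the stacked data in this experiment have n = 8).

```python
import math

def chi2_sf_even(x, df):
    """P(chi-square with even df > x), via the closed-form Poisson sum."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

def chi2_critical_even(alpha, df, hi=200.0):
    """Solve chi2_sf_even(x, df) = alpha for x by bisection on [0, hi]."""
    lo = 0.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_sf_even(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def is_contaminant(x, means, inv_covs, crit):
    """True if the quadratic form exceeds crit for every class."""
    for mu, P in zip(means, inv_covs):
        d = [a - b for a, b in zip(x, mu)]
        q = sum(d[i] * P[i][k] * d[k]
                for i in range(len(d)) for k in range(len(d)))
        if q <= crit:
            return False   # plausible under at least one pure class
    return True
```

For n = 8 and a size of .1 the critical value is about 13.36, so a boundary pixel is rejected only if its quadratic form exceeds that value for every one of the classes.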

Let $Y_1, \ldots, Y_M$ denote the boundary observations remaining after rejecting
those classified as contaminants. We treat $Y_1, \ldots, Y_M$ as an independent
sample from the mixture density (10), with unknown mixing proportions
$\alpha_1, \ldots, \alpha_m$ but known components $N_n(\mu_\ell, \Omega_\ell)$,
and obtain a MLE of $\alpha_1, \ldots, \alpha_m$ by successively substituting
into (6). Obviously, $Y_1, \ldots, Y_M$ is, at best, a truncated sample
from the mixture (10), so that the MLE of $\alpha_1, \ldots, \alpha_m$ is
asymptotically biased. We do not expect this effect to be a reason for serious
concern. After obtaining the MLE for $\alpha_1, \ldots, \alpha_m$, we use as our
final estimate of the number of pixels corresponding to class $\ell$ the
quantity $N\delta_\ell + M\alpha_\ell$, where $\delta_\ell$ is given by (9).
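
The boundary step can be sketched as follows, assuming that substituting into (6) amounts to the standard fixed-point update for mixing proportions when the component densities are held fixed. The names are hypothetical; `dens[i][l]` stands for the value of the class `l` density at retained boundary pixel `i`, and every pixel is assumed to have positive density under the current mixture.

```python
def estimate_mixing_proportions(dens, alpha0, n_iter=100):
    """Iterate alpha_l <- mean_i alpha_l * f_l(Y_i) / sum_k alpha_k * f_k(Y_i)."""
    alpha = list(alpha0)
    M = len(dens)
    for _ in range(n_iter):
        new = [0.0] * len(alpha)
        for row in dens:
            mix = sum(a * f for a, f in zip(alpha, row))   # mixture density at Y_i
            for l, (a, f) in enumerate(zip(alpha, row)):
                new[l] += a * f / mix
        alpha = [v / M for v in new]
    return alpha

def final_pixel_counts(n_pure, pure_fraction, n_boundary, boundary_alpha):
    """Per-class pixel estimate: pure pixels plus retained boundary pixels."""
    return [n_pure * d + n_boundary * a
            for d, a in zip(pure_fraction, boundary_alpha)]
```

Because the component densities do not change, the values in `dens` (or their ratios) can be computed once and reused on every pass, which is what makes the boundary iteration cheap.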

3. Implementation.

The number of classes assumed in this experiment is determined by AMOEBA
subroutines PAINT and CLASFY. PAINT produces the pure/boundary division of
a 5 x 6 mile LACIE segment, an array LABELS containing a patch description for
each of the pure pixel locations, and a map of the scene showing the pure and
boundary pixels. CLASFY produces an array CLASS containing the final cluster
designation of each of the patches. A subroutine STAT2 has been attached to
AMOEBA which calculates and saves patch sizes ($N_j$), patch means ($m_j$) and
noncentral patch scatters ($R_j$). These statistics are then passed to STAT3
which uses the CLASS array to compute the fraction ($\alpha_\ell^0$) of patches
assigned to each cluster, the fraction of pure pixels assigned to each cluster,
and cluster means ($\mu_\ell^0$) and covariances ($\Omega_\ell^0$) for the pure
data only. These cluster statistics are used as initial estimates of the
parameters for the iteration procedure described by (6)-(8). CLASFY
occasionally produces a cluster with such a small number of pure pixels that an
initial covariance estimate cannot be calculated. In this case the initial
$\Omega_\ell^0$ in HISSE is obtained by averaging the cluster sample covariance
with a multiple of the identity so as to ensure that the condition number of
$\Omega_\ell^0$ is no greater than 16.
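
The text does not say how the multiple of the identity is chosen. One possible reading, offered only as an assumption, is an equal-weight average of the sample covariance with c times the identity, with c taken as the smallest value that brings the condition number down to the limit. The sketch below assumes NumPy; the function name is hypothetical.

```python
import numpy as np

def regularize_covariance(S, max_cond=16.0):
    """Average S with c*I so that the condition number is at most max_cond.

    Assumption: c is the smallest non-negative value satisfying
    (lam_max + c) / (lam_min + c) <= max_cond. A covariance that already
    meets the limit is returned unchanged. An all-zero S cannot be repaired
    this way and is returned as is.
    """
    S = 0.5 * (S + S.T)                       # enforce symmetry
    eig = np.linalg.eigvalsh(S)               # ascending eigenvalues
    lam_min, lam_max = max(eig[0], 0.0), eig[-1]
    c = max(0.0, (lam_max - max_cond * lam_min) / (max_cond - 1.0))
    if c == 0.0:
        return S
    return 0.5 * (S + c * np.eye(S.shape[0]))
```

For a singular sample covariance the result has condition number exactly at the limit, which keeps the first evaluations of the quadratic forms in the iteration numerically stable.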





After initialization HISSE produces iterative estimates
of the parameters until a convergence criterion is satisfied, after which the
estimates $\delta_\ell$ are computed in the manner described in Section 2 and
stored.

The boundary pixels are identified from the LABELS array output by AMOEBA.
For each one, the quadratic forms
$(x - \mu_\ell)^T \Omega_\ell^{-1} (x - \mu_\ell)$ are computed and tested
against the threshold value of $\chi^2_\alpha$, as in (11). For those boundary
pixels not rejected by the thresholding procedure, the likelihood ratios
$f_\ell(x)/f_k(x)$ are computed and stored in a temporary disc file for use in
the iteration procedure for estimating $\alpha_1, \ldots, \alpha_m$. Although
the number of boundary pixels processed is much greater than the number of
patches, the cost is comparable to that of processing the pure data because the
iteration procedure (6) can be carried out simply by accessing the temporary
file.

For the purpose of labeling classes HISSE identifies, for each class $\ell$,
the three patches $j$ which have the highest posterior probability
$\alpha_\ell f_\ell(X_j)/f(X_j)$ of membership in that class. The spatial
coordinates of pixels in these labeling patches are obtained from the LABELS
array. Thus, in using HISSE, the analyst would be required to make a judgement
concerning the identity of each class based on his ability to label the
labeling patches.
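
Selecting the labeling patches is a simple ranking of patches by posterior probability within each class. A minimal sketch, with hypothetical names and the posterior probabilities taken as given:

```python
def labeling_patches(posteriors, k=3):
    """For each class, return the indices of the k patches with the highest
    posterior probability of membership in that class.

    posteriors[j][l] is the posterior probability that patch j is in class l.
    """
    n_classes = len(posteriors[0])
    chosen = []
    for l in range(n_classes):
        ranked = sorted(range(len(posteriors)),
                        key=lambda j: posteriors[j][l], reverse=True)
        chosen.append(ranked[:k])
    return chosen
```

The returned patch indices would then be looked up in the LABELS array to recover pixel coordinates for display to the analyst.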

4. Numerical Results.

The results tabulated in this section are from four passes over LACIE segment
1618 acquired in May, June, August and September of 1976. The data were
preprocessed by premultiplying each single pass 4-dimensional data vector by
the LANDSAT I transformation to brightness-greenness space

       $\begin{pmatrix} 1 & 1 & 1 & 0 \\ 0 & -1 & 1 & 1 \end{pmatrix}$

and stacking the brightness-greenness vectors to obtain 8-dimensional data
vectors. The results of the AMOEBA run were 7500 pure pixels, organized
into 310 patches. The number of clusters estimated by NUMCLU was 13. HISSE
required 19 iterations to estimate the parameters of the pure data mixture
model. Of the 15290 boundary pixels, the thresholding procedure rejected 5575.
The number of passes through the remaining 9725 boundary pixels required to
produce estimates of the boundary mixing proportions
$\alpha_1, \ldots, \alpha_m$ was 8. The total cost of running AMOEBA and HISSE
together is much less than that of running UHMLE or CLASSY on the full scene.
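
The preprocessing described at the start of this section can be sketched as below. The two matrix rows are read from the printed text as (1, 1, 1, 0) and (0, -1, 1, 1), which is an assumption about a poorly reproduced matrix; the function names are hypothetical.

```python
# Assumed reading of the printed 2 x 4 brightness-greenness matrix.
BG = [[1, 1, 1, 0],
      [0, -1, 1, 1]]

def to_brightness_greenness(pixel4):
    """Map one 4-band pixel from a single pass to (brightness, greenness)."""
    return [sum(c * v for c, v in zip(row, pixel4)) for row in BG]

def stack_passes(passes):
    """Concatenate the transformed vectors from each pass, in pass order."""
    stacked = []
    for pixel4 in passes:
        stacked.extend(to_brightness_greenness(pixel4))
    return stacked
```

With four passes this yields the 8-dimensional vectors used in the experiment, two components per acquisition date.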

Figures 1-4 show the scatter plots in brightness-greenness space, corresponding
to each of the passes, of the means of the patches determined by AMOEBA.
Particularly in the fourth pass, the tasseled cap configuration described in
[4] is visible. Figures 5, 6, and 7 show the plotted trajectories of the
estimated class means from pass to pass on the same coordinate system used in the
4th pass scatter plot. The trajectories of the means of the pure data clusters
produced by AMOEBA would be nearly indistinguishable. It is interesting that
the class mean trajectories eventually given a small grains label exhibit a
characteristic triangular shape. This characteristic can be used as
an aid in labeling the classes (see [3] for a discussion of this idea).

Figure 8 tabulates the initial cluster means, cluster variances, and patch
membership proportions obtained from AMOEBA's clustering of the pure data. Figure
9 tabulates class means, variances and patch membership probabilities (the
$\alpha$'s) estimated by HISSE. Figure 10 compares the estimates derived from
AMOEBA and HISSE of the fraction of pure pixels belonging to each cluster
(class). Notice that in Figure 10, there is a significant difference between
the two estimates, particularly in the more populous classes. These classes
happen to be the most spectrally confused classes. There is also an appreciable
difference seen in Figures 8 and 9 between the respective estimates of the
$\alpha$'s, although the difference is not as pronounced.

Figure 11 shows the AMOEBA boundary map for segment 1618 with the three 
labeling patches corresponding to each class outlined. A ground truth map 
was used to attach ground truth labels to the labeling patches and hence to 
the classes. Most of the classes were given a single ground truth label by 
this procedure. Classes 2, 5, 6, and 7 were not assigned a single ground truth
label and appeared to be made up of more than one type of small grains. However,
each of these classes was clearly small grains. Class 1 was the only really 
difficult class to label; each of its labeling patches represented small grains 
ground truth labels as well as such labels as beans and fallow. In other words, 
the labeling patches for class 1 were spurious. For the purpose of obtaining 
an aggregate small grains estimate, it was assumed that class 1 was a mixture 
of 1/3 small grains, 1/3 beans, and 1/3 fallow acreage. 

Figure 12 shows the final acreage estimate for each of the 13 classes in 
the mixture model, the acreage of the set C of boundary pixels rejected as 
outliers or contaminants, and the crop labels (including "small grains") assigned 
to each class. The aggregate small grains acreage estimate is 15,288. The 
small grains acreage from the ground truth tape is 15,465, an error of only 1.1%. 
If class 1 is labelled all small grains, the error is 15%. If none of class 1 
is classified small grains, the error is 9.2%. It should be emphasized that the 
problem of labeling cluster #1 from AMOEBA is also serious, since cluster 1 is 
centered near the means of the spurious patches used to label class 1. 
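
The sensitivity of the aggregate estimate to the treatment of class 1 can be reproduced from the class acreages tabulated in Figure 12. The sketch below assumes that the classes labeled Small Grains, Spring Wheat and Barley (classes 2 through 8 and class 11) are the ones counted as small grains; that set is inferred from the crop labels rather than stated in the text, and the function names are hypothetical.

```python
# Class acreages from the class acreage table (Figure 12).
ACREAGE = {1: 3764, 2: 1550, 3: 3560, 4: 1237, 5: 2253, 6: 3257, 7: 1218,
           8: 262, 9: 917, 10: 4, 11: 697, 12: 49, 13: 171}
SMALL_GRAINS = {2, 3, 4, 5, 6, 7, 8, 11}   # inferred from the crop labels
GROUND_TRUTH = 15465                        # small grains acreage, ground truth tape

def small_grains_estimate(class1_share):
    """Aggregate small grains acreage when a share of class 1 is counted."""
    base = sum(ACREAGE[c] for c in SMALL_GRAINS)
    return base + class1_share * ACREAGE[1]

def relative_error(estimate):
    """Absolute error relative to the ground truth acreage."""
    return abs(estimate - GROUND_TRUTH) / GROUND_TRUTH
```

Evaluating `small_grains_estimate` at shares of 1/3, 1 and 0 gives relative errors of roughly 1.1%, 15.1% and 9.3%, which agree with the figures quoted above to within rounding.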

The thresholding of boundary outliers makes a pronounced difference in the
estimate. The small grains acreage estimate derived from HISSE without
thresholding would be 19,230, comparable to the estimate of 20,336 derived
from AMOEBA's cluster map.

5. Conclusions.

The accuracy with which HISSE estimated the small grains acreage in 
segment 1618 was impressive, to say the least, but of course the procedure 
must be tested on other segments for which ground truth is available. Also, 
as we mentioned in Section 4, the accuracy of the estimate depends on the 
classification given to the labeling fields for class 1, the problem class. 

The procedure we used, dividing the class evenly among competing ground truth
labels, seems fair; however, in an operational situation the class would be
labeled by an analyst looking at a film product and it seems unlikely that
he would apportion the class in such a way. In any case, the greatest possible
relative error was 15%, still a marked improvement over the accuracy obtained
by labeling AMOEBA's clusters and counting the cluster assignments, or that
achieved by HISSE without the thresholding procedure.

The performance of HISSE, or AMOEBA, depends in large part upon the purity
with respect to ground truth labels of the patches found by AMOEBA, which is
influenced by the user-defined "percent in fields" parameter in AMOEBA. In this
experiment we defined the parameter as 50%; that is, we conservatively estimate
that 50% of the pixels in the scene should be found in fields. By reducing the
size of this parameter, we expect to produce a higher degree of patch purity
and thus alleviate the problem of having a class represented by labeling patches
which should not be patches at all. We hope that this will not aggravate another
problem, namely that the ground truth map for segment 1618 shows a few large
fields representing important classes (such as barley) in which no patches
were found.

Finally, we note that although the aggregated small grains acreage was 
very accurately estimated, the individual estimates for the various small grains 
classes (spring wheat, barley, oats, and millet) were not nearly as accurate. 
Indeed, several of the HISSE classes could not be given a single one of these 
labels, although they clearly represented small grains. Moreover, there was 
one significant crop class (beans) without a small grains label which was 
seriously underestimated. Thus, the usefulness of HISSE in a multicrop inventory 
cannot yet be determined. 
PURE PIXEL PROPORTIONS ($\varphi_\ell$)

CLUSTER #   AMOEBA ESTIMATE   CLASS #   HISSE ESTIMATE
    1            .054            1           .143
    2            .136            2           .107
    3            .259            3           .188
    4            .101            4           .089
    5            .109            5           .123
    6            .171            6           .174
    7            .067            7           .068
    8            .021            8           .021
    9            .034            9           .034
   10            .001           10           .001
   11            .031           11           .038
   12            .003           12
   13            .012           13           .012
CLASS ACREAGE ESTIMATES

CLASS   ACREAGE   CROP LABEL
  1      3764     Small Grains, Beans, Idle Fallow
  2      1550     Small Grains
  3      3560     Spring Wheat
  4      1237     Spring Wheat
  5      2253     Small Grains
  6      3257     Small Grains
  7      1218     Small Grains
  8       262     Spring Wheat
  9       917     Idle Cover Crop
 10         4     Flax
 11       697     Barley
 12        49     Homestead
 13       171     Trees
  C      6124     Contaminated Data